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Abstract 

A new code, named MAP, is written in FORTRAN language for magnetohy- 
drodynamics (MHD) calculation with the adaptive mesh refinement (AMR) 
and Message Passing Interface (MPI) parallelization. There are several op- 
tional numerical schemes for computing the MHD part, namely, modified 
Mac Cormack Scheme (MMC), Lax-Fridrichs scheme (LF) and weighted es- 
sentially non-oscillatory (WENO) scheme. All of them are second order, two- 
step, component-wise schemes for hyperbolic conservative equations. The to- 
tal variation diminishing (TVD) limiters and approximate Riemann solvers 
are also equipped. A high resolution can be achieved by the hierarchical 
block-structured AMR mesh. We use the extended generalized Lagrange 
multiplier (EGLM) MHD equations to reduce the non-divergence free error 
produced by the scheme in the magnetic induction equation. The numerical 
algorithms for the non-ideal terms, e.g., the resistivity and the thermal con- 
duction, are also equipped in the MAP code. The details of the AMR and 
MPI algorithms are described in the paper. 
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1. Introduction 



The adaptive mesh refinement (AMR) algorithm was firstly proposed by 
Berger and Oliger [l[ and Berger and Colella jll , which can recursively create 
finer overlapping meshes to a given accuracy based on the Richardson error 
estimation. Using AMR, one can resolve a very small region in a very large 
scale simulation, for instance, the simulation of the propagation of solar wind 
from the Sun to the Earth, the protostellar collapse in the processes of star 
formation, the very thin current sheets in magnetic reconnection, and so on. 
There are two advantages comparing to uniform grid: AMR is very fast and 
effective to get a global high resolution; it can also refine any part of the 
computational box. 

The preliminary block structured AMR method [ij is not so easy to 
extend to multiprocessor calculation because the sizes of the blocks in the 
AMR hierarchical grid are different which generate difficulty in load balance 
between different processors, although the arbitrary shapes of the blocks can 
get a flexible and efficient memory usage. The data blocks, which are the 
subgrids of the base grid, allow to have arbitrary shapes and can merge 
with other blocks at the same level. In order to overcome this drawback, a 
simple approach was developed by DeZeeuw and Powell jsj. The basic idea 
is to build a hierachical binarytree for one dimension (ID), quadtree for two 
dimension (2D) and octree for three dimension (3D). These tree structures 
contain the necessary information between the parent blocks, child blocks and 
neighbor blocks, which can be used to update the block data from the finer 
blocks or exchange boundary data from the neighbor blocks. All blocks in 
all refinement levels have the exact same shape and can be easily parallelled 
using the Morton space filling curve (Z-curve [3|), Hilbert space filling curve 
(H-curve jsj), or other curves. 

There are many existing AMR codes including FLASH [6|, AMRVAC [7|, 
^ - i-i IiLrAMSES [l3,RIEMANNll3j, 

171 and so on. Some of them. 



PLUTO m, SFUMATO NIRVANA 10 



CRASH MJ, CHARM M, CASTRO 16 



e.g. FLASH and PLUTO, are using the AMR librarvhke PARAMESH [18 
toolkit, CHOMBO library (lij or BoxLib library |2oj. Other codes, how- 
ever, developed their own AMR algorithms for better performance. Our 
code, named as MAP (MHD code with adaptive mesh refinement and paral- 
lelization for astrophysics), was also designed with the same considerations. 
Besides, being easy to use and modify is another purpose. In the developing 
process, we learn much experience from these pioneers' papers, for instance. 
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the block structured idea from Berger and Colella [2| and DeZeeuw and Pow- 
ell the error estimation method from Ziegler [ij], the file management 
method from the code AMRVAC developed by Keppens et al. and CANS 
(Coordinated Astronomical Numerical Softwares). 

The major difference between our code and most of the existing codes 
is that we use the extended generalized Lagrange multiplier (EGLM) MHD 
equations which have one more variable and one more equation represent- 
ing the damping and transfer of the non-divergence free error as described 



by Dedner et al. j2l|]. The method of constrained transport (CT) [22| is 
not included in our MAP code, although it is almost perfect to control the 
divergence free condition to the machine round-off error. The reasons are: 
(1) it is complicated in an AMR code as it requires a staggered mesh; (2) 
additional variables have to be allocated for the cell center value of magnetic 
field, and the boundary data exchanging process is complex which takes much 
longer time than the unstraggered grids; (3) it requires a strict divergence 
free boundary condition, because CT can only guarantee the zero divergence 
condition from the old time to new time. When we use a rapidly-changing 
boundary, for instance, an emerging flux boundary, it will inevitably pro- 
duce non-divergence free magnetic field. In this case, the CT method may 
not be a good choice. There are several numerical schemes optionally for com- 
puting the EGLM-MHD equations, namely, modified Mac Cormack Scheme 
(MMC) |23|. Lax-Fridrichs (LF) |2J] and weighted essentially non-oscillatory 
(WENO) [25|. Since all of them are second-order, two-step, component- 
wise schemes for hyperbolic conservation laws, the code is fast, effective and 
lightweight. One can easily understand what is going on in the FORTRAN 
codes of the three schemes according to the formulae described in Section [31 
It is also convenient to modify the code or even add a new MHD solver. 
This is another feature of our code. The three schemes have their own spe- 
cial implements. The first two schemes, i.e. the MMC and LF schemes, 
are equipped with the total variation diminishing [TVD 26] limiters, while 
the latter two, i.e. the LF and WENO, are implemented with the approx- 
imate Riemann solvers. In MHD, the exact solver for Riemann problem is 
too complex and time consuming, thus only the simple solver like nonlinear 
Harten-Lax-van Leer Contact (HLLC) [27], Harten-Lax-van Leer Discontinu- 
ities (HLLD) jisi and Roe linear solver [29, 30] are adopted in the code MAP. 
There are several reasons for us not using higher-resolution numerical schemes 
(for instance, the piecewise parabolic method (PPM) by Colella j3l[ , Dai and 
Woodward [32] or corner transport upwind PPM (CTU-PPM) by Gardiner 
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and Stone [33, 34 1 or 5th order WENO by Jiang and Shu [3^, Jiang and Wu 
[25I): (1) the lack in accuracy can be compensated by the hierarchical block- 
structured AMR algorithm; (2) the accuracy of many parts of the equations 
like the thermal conduction and the resistivity, divergence cleanance method, 
as well as AMR reflux and interpolation method, can not always guarantee 
a high-order precision, which would lead to a poor global accuracy; (3) the 
high-order method is complex and time consuming. 

Our current MAP code is based only on the Cartesian coordinates, and 
the extension to cylindrical and spherical grids will be conducted in the next 
version. The other improvements like finite difference (FD) and finite vol- 
ume (FC) schemes, the radiation cooling for both optically thin and thick 
situations, and the relativistic MHD module are in consideration too. The 
AMR strategy for high-order FD and FV schemes are almost the same except 
the algorithm of boundary reflux between coarse and fine meshes, interpola- 
tion from parent blocks to child blocks and update parent blocks from child 
blocks at the new time. For a high-order FD scheme, if we use the same 
procedure as in FV, the scheme may lose some conservation and accuracy 
according to our practical experience. That is our additional reason for using 
only second-order schemes. 

The organization of this paper is as follows: Section [2] introduces the ba- 
sic EGLM-MHD equations for the divergence clean in our code; Section [3] 
describes the numerical schemes for solving the EGLM-MHD equations in- 
cluding the resistivity and thermal conduction; The detailed AMR algorithm 
with MPI is given in Section HJ Some numerical tests in ID, 2D and 3D are 
presented in Section |5l Finally, we give a summary in Section |6l 

2. EGLM-MHD equation 

The dimensionless MHD equations with gravity, resistivity and thermal 
conduction included are given in conservative form as follows: 

| + V-(pv) = 0, (1) 



^ +V- (^(^p+-i?^Jl + pvv-BBj =pg, (2) 
— + V ■ (vB - Bv) = -V X (r?V x B) , (3) 
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^+V- ( V ( e + + j9 ) - B (B • v) ) = -V-((r/V x B) x B)+V-(/tVT)+pg-v 



dt V V 2 

(4) 

here, eight independent conserved variables are the density (p), momentum 
{pVx, pVy, pVz), magnetic field {B^, By, B^), and total energy density (e). 
The expression of the total energy density is e = p/(7 — 1) + pv'^/2 + B'^/2. 
The pressure p and temperature T are dependent on the eight conserved 
variables, g is the gravity vector, and 77 the magnetic resistivity coefficient, 
and K is the thermal conductivity coefficient. It is noted that k is not always 
a constant. It may relate to the local temperature value with the form kqT^/'^ 
(here kq is another parameter for describing the conductivity). This thermal 
ffux may associate with the direction of the magnetic field, the direction of 
this thermal flux may change to (B ■ VT)B/i?^ in some cases. Finally, a 
unity matrix I is involved for matrix operations. The vacuum permeability 
is omitted in this dimensionless MHD equation. 

Almost all modern MHD codes have to face the problem of how to guar- 
antee the divergence free requirement. Because of the discretization and 
numerical errors, the performance of the MHD code can be unphysical (36| . 
There are several ways to maintain V ■ B = for MHD equations ([T]) - 
dl]): (1) 8-wave formulation [s^], (2) the CT method j22|, (3) the projection 
scheme jsoj . The 8-wave formulation can be easily implemented in a code by 
so-called "divergence source terms" without modifying the MHD solver. The 
numerical value of V ■ B can be controlled to a truncation error. The CT 
method is much robuster which can maintain the zero divergence condition 
to the machine round off error, but this method needs the staggered mesh 
which increase the difficulty in coding especially for AMR algorithm. The 
projection scheme introduces the Poisson equation to clean the numerical er- 
ror of V • B which canpreserve the conservative properties and the efficiency 
of the base scheme [38|]. Another way to keep divergence- free condition is 
to modify the MHD equations dl]) - dl]) by using vector potential A instead 



of the magnetic field B as it was used in Chen et al. [39[, or by using the 
vector magnetic potential or Euler poential. The advantage is that the diver- 
gence free condition is always satisfied, however, the MHD equation should 
be rewritten. 

Recently, Dedner et al. [21] proposed the extended generalized Lagrange 
multiplier (EGLM) formulation of the MHD equations. This method uses 
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two additional waves to transfer the numerical error of V ■ B. Thus the local 
divergence error can be damped and passed out of the computational domain. 
It can also tansfer and damp the numerical error coming from the bound- 
aries. As we described above, the rapidly-changing boundary will inevitably 
produce non-divergence free magnetic field. Actually, another method, the 
GLM-MHD system (i.e. mixed GLM scheme in Dedner's paper), is also 
proposed in Dedner's paper and Dedner et al. j2l| themselves recommend 
GLM-MHD system rather than the EGLM-MHD system. The EGLM-MHD 
equations ([5]) - ([9]) is the GLM-MHD system extended by additional terms 
which may lead to some conservation loss. However, in some of our ap- 
plications, we found that the GLM-MHD system works not so well as the 
EGLM-MHD system. Another reason, once we implement the EGLM-MHD, 
it is very easy to roll back to the GLM-MHD by disabling the additional 
source terms. Hence we adopted the EGLM-MHD equations of Dedner et al. 



2l\ in our simulations. The EGLM-MHD equations with resistivity, thermal 



conduction and gravity included are given as follows: 



^ + V-(pv) = 0, (5) 



^ ^^^^ + V ■ f + ) I + pvv - BB ) = - (V ■ B) B + pg , (6) 



dt V V 2 



r)B 

— + V ■ (vB - Bv + ^I) = -V X {r]V x B) , (7) 



B IB . 



-B-(V^/')-V-((r/V X B) X B)+V-(KVT)+pg-v 

(8) 



(9) 



where ip is a scalar potential propagating divergence error, Ch the wave speed. 



and Cp the damping rate of the wave [9|, |2l| . The other symbols have their 



normal meanings as in Eq. ([I]) - (jl]). As suggested in Dedner et al. [2l|, the 
expressions for Ch and Cp are 
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'^h = -T7 min(Ax, Ay, Az) 



(10) 




(11) 



where, At is the time step, Ax, Ay and Az are the space steps, Ccji is a 
safety coefficient less than 1. q G (0, 1) is a problem dependent coefficient 
to decide the damping rate for the waves of divergence errors. The value of 
Cd is 0.18 in most of our tests. We can see that Ch and Cp is not independent 
of the grid resolution and the scheme used. Hence we may have to adjust 
their values for different situations. 

3. Numerical schemes 

We use three different optional numerical schemes to solve the MHD 
part, i.e. MMC, LF and WENO as we mentioned above. In our calculations 
presented here, we mainly use the WENO scheme, the other two are used for 
comparison (see the accuracy test of these schemes in Section [5]). Although 
the accuracy of all three schemes is not so high in both space and time, the 
advantage is that the coding is simple and the running speed is fast. In 
this section, we briefly review the three schemes by using the following ID 
equation: 



here u{x,t) represents the eight conserved variables (p, pvx, pvy, pvz, B^, By, 
Bz, e) and we assume the grid in domain [a, b] is uniform and it is discretized 
into n points (for FD) or cells (for FV): 



the discretization of Eq. f lT2|) in finite difference form at the point i is as 
follows (using Ui = u{xi,tn) and fi = f {u{xi,tn)) for simplicity): 



du{x,t) ^ df {u{x,t)) 
dt dx 



0, 



(12) 



a < Xi < b ,i = 1,2,3,4, ...,n , 
Ax = Xi — Xi-i ,i = 2,3,4, ...,n , 



(13) 
(14) 



where f^^i is the approximate numerical flux. The notation n + 1 means 
the values of u at the new time. Other terms without the superscript n + 1 
are old values. The time step is At, ^ ± | means the half grid points. The 
discretization of Eq. f|T2|) in the finite volume form at the cell i is: 



where z ± ^ are the cell boundaries and the cell average is: 
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^^ = -^/ n(e,t)de, (17) 

Jx. 1 

where we use the cell average values to get f^^l for this equation. 

Whether FD schemes or FV schemes, the new value can be obtained by 
the following formula (if we omit the average bar on the variable u): 

As we mentioned above, the AMR algorithms for high-order FD schemes 
and high-order FV schemes are not exactly the same. The main differences 
include boundary reflux between coarse and fine meshes, interpolation from 
parent blocks to child blocks and updating parent blocks from child blocks at 
the new time. Note that the initial definitions of the cell average value and the 
point value are different. However, under our second-order approximation, 
there is no difference in the mathematic expressions between FV and FD and 
the point values are regarded as the cell average values in our MAP code. 

3.1. MMC scheme 

We briefly give the FD formulae here, for more information we refer the 



readers to the paper by Yu and Liu [23|]. The numerical flux (/j+i) needed 



in the Eq. (fTSjl is formed by using the Lax-Friedrichs splitting method with 
an additional higher order modified term: 

where 
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9-1 

' 2 



9 



r . 1 



i+4 



9.^ 



9.^ 



9.; 



i+4 



9,^1 

2 



Ui 



ft 



fi+l ft 



' 2 



(21) 
(22) 
(23) 
(24) 



The values of / and a are obtained from the value u. a and a is the local 
maximum eigenvalue of / and / (namely the maximum wave speed taken 
from the relevant range of u and u), respectively. The TVD flux limiter is 



taken as the form 40 



f r , 


r 


I 1 ^ 


r 



(25) 



3.2. LF scheme 



The FV formulae of LF scheme is given here, see Toth and Odstrcil [2J| for 
more information. The numerical flux {fi_^,l) needed by Eq. fllSp is obtained 
by the following flux (/„): 



u: 



(26) 



The simplest fm flux is Lax-Friedrichs flux but our MAP code also im- 
plemented approximate Riemann solvers like HLLC 27|, HLLD 28| and Roe 
solver 29|, |30| . Since the formulae of these Riemann solvers are too compli- 
cated to be given here, thus we only list the Lax-Friedrichs flux as 



/„(a,6) = ^(/(a) + /(6)-a(6-a)) , 
where a is the local maximum speed. Other variables are given as: 

At 



Ui 



2Ax 



(27) 



(28) 
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=^i + ^(p (^Am^.i , Am.+ i) , M+i = Mi - ^0 (^Am^.i , Am.+ i) , (30) 



Am^.^.! = Ui+i - Ui , (31) 

where is a TVD slope limiter. Note that the slopes in equations ( 129|) 
and f l30ll are calculated from the same variable differences Au^^i for better 



performance [2J]. The form of the limiter used in our code is 



(a, 6) = minmod (a, 6) = sgn (a) max (0, min (|a|, sgn (a) 6)) , (32) 
where sgn (a) stands for the sign of value a. 

3.3. WENO scheme 

As for the component- wise FV WENO scheme, in order to form Eq. (fT8|) . 
we can also use the same Lax-Friedrichs flux Eq. (1271) or Riemann solvers 
like HLLC, HLLD and Roe solvers in 



where the reconstructions of values u. i are 

*+2 



2a,- , iw. , 1 + a,-_iv. 1 a,- , 1 w+ 1 + 2a,_if 1 



where 



= i (wi + Ui+i) , v._^ = ^ {3ui - Ui-i) , (35) 
w+ 1 = ^ (3Mi - Ui+i) , f + 1 = ^ (wi + Mi-i) , (36) 
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'^i+i = ^ • (37) 

[e + {Ui+i - Ui) ) 

where the constant e = 10~^^ in our code, which can avoid the denomi- 
nator becoming zero. Following the formulae listed above, we can obtain 
the value of u at the new time. However, the time accuracy is only the first- 
order. It is recommended to use the optimal second-order TVD Runge-Kutta 



method j4l| instead of formula f lTS]) . i.e.. 



(38) 



2 



Note that uj is an intermediate variable and the numerical flux f^, i is 
taken from uj. 

3.4- Thermal conduction 

Thermal conduction is added into the MAP code by an explicit scheme. 
Generally speaking, once we consider the thermal conduction, the safety time 
step become very small. It may take a long wall time to get the result if the 
code evolves the whole EGLM-MHD part with such a small time step. Thus 
we set a subcycle for treating the thermal conduction. The subcycle means 
that the code only computes the thermal conduction part of energy equation 
many times with the time step determined by the speed of thermal conduction 
within an EGLM-MHD time step, as shown in the schematic chart (Fig. [1]). 
The equation of the thermal conduction for integration of subcycle is written 
as 

§ = V ■ (kVT) . (40) 

Assuming the grid is uniform, the one dimensional numerical scheme for 
this equation is given as 



e^+^ = ei + {^/KiKi+i (Ti+i - Ti) - .Jk~k~{ {Ti - T^.i)) . (41) 
Ax 

The time step used in the subcycle is given by 
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Reg rid 



Framework synchronization 




j Program End | 



Figure 1: Schematic chart of our MAP code. The AMR algorithm includes the regriding, 
framework synchronization, load balance and neighbor statistics procedures. The time 
step in the subcycle is controlled by actual physical problem, e.g., the thermal conduction. 
The boundary condition of the subcycle is treated only for the specific physical quantity, 
i.e., the total energy. The boundary conditions contain the refluxing, data exchange and 
boundary fixing procedures. ■'"^ 



At = c, 



(min(Ax, Ay, Az))'^ 



(42) 



■cfl 



Udim max{nT/p) 



where Udim is the number of dimensions. The subcycle is simple and fast 
but one should be careful to deal with it. Since the other variables do not 
change with time during the subcycle, in our experience it is better to reduce 
the number of cycles to dozens or less to get the reliable results. Moreover, 
there should be a threshold for the temperature gradient in Eq. (1401) . because 
for any matter the speed of thermal conduction can not be infinite. That 
is a problem dependent parameter. So far, only the thermal conduction is 
calculated by the subcycle. If necessary, it is easy to add other physical 
process into the subcycle. 

3. 5. Resistivity 

As for the resistivity terms 'R.{Rx-, Ry, Rz) = —V x (r^V x B) and Rg = 
— V ■ ((77V X B) X B) in the induction equations ([7]) and ([HD, we use the 
simple central finite difference scheme to compute the current density first 
{J{Jx, Jy, Jz) = ^ ^ B), and then the resistivity terms, i.e.: 




X )i,j,k 




{Bz)i+l,j,k — {Bz)i-l,j,k 



2Az 



(43) 



2Az 



2Ax 




(-Bj,)j+l,j,fc ~ {By)i_l^jk 




2Ax 




{vJz)i,j+l,k — {vJz)i,j~l,k 

2A^ 

{vJx)i,j,k+l — {^Jx)i,j,k-1 

2Ai 

{Vjy)i+l,j,k — {f]Jy)i-l,j,k 

2Ax 



(44) 




{vJx)i,j+l,k — {vJx)i,j-l,k 

2A^ 



13 



2Ax 

{7]{J^B, - J^^x))^j+i,fc - {yjJxB, - JzB^)). ._^ f^ 

2Ay 

{ri{JyB.^ - JxBy))- -,^^-^ - {ri{JyB^ - JxBy))i -j^_^ 

2Ai 



where Rx-, Ry and Rz are for the induction equation ((Tj) and Re for the energy 
Eq. (IH]). The resistivity model can be modified to any form according to the 
user's need. 

3. 6. Damping zone 

The damping zone is a range where the MHD waves and matter motions 
can be damped to the initial condition. It is a kind of boundary condition 
because it is usually adjacent to the physical boundary. The goal of the 
zone is to stablize the boundary and to remove the non-physical infiows and 
outfiows produced by the numerical error. Given a start position {Dg) and 
an end position {D(.) of the damping zone, the one dimensional damping 
function D{x) is 



where uq is the initial value of the variable u. Fig. [2] shows the function of 
D{x) with the situations > D^. and Dg < D^., which correspond to the 
different damping directions, namely, physical waves and motions will vanish 
at the lower boundary for Ds > -De and at the higher boundary Ds < De. It 
is the same way to treat the damping zone in multiple dimensions. One can 
do it dimension by dimension or treat x, y and z directions together. The 
damping function only has effect on the density (p), velocities {v^^Vy^Vz) 
and pressure (p), but not to the magnetic field. Because the modification of 
magnetic field will change the magnetic topological configuration which may 
destroy the divergence free condition leading to some non-physical results. It 
is noted that the damping zone can also produce some weak reflected waves. 
That is inevitable for most of the boundary conditions and these reflected 
waves have little effect on the results in our tests. 





(47) 
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Damping Function (D^ > DJ Damping Function (D^ < DJ 




Figure 2: Damping function D{x). Left panel shows the situation of Dg > where the 
left bounday is damped, while right panel shows the case with Dg < where the right 
bounday is damped. 



3.7. Multiple spatial dimensions 

All we discussed above is in ID space. It is easy to extend the schemes to 
two or three dimensions. However, the method in extending the numerical 
scheme to multiple dimensions depends on the schemes. We treat the WENO 
scheme dimension by dimension in 2D and 3D. For the 2D case, the numerical 
flux in x-direction is calculated first and then the fiux in y-direction by using 
the same variables at the old time. After that, the variables are updated to 
the new time by taking the numerical fiux in x- and y- directions simulta- 
neously. If the TVD Runge-Kutta method is available, the variables need to 
be calculated twice to update all variables. For TVD-MMC and TVD-LF 
methods, the dimension by dimension method is not necessary, since we can 
do all directions together. The formulae for 2D or 3D are easy to derive 
following the procedure in Sections 13.11 and 13. 2[ The time step in multiple 
dimensional problems are shown as follows: 

min(Ax, Ay, Az) 
At = Ccfi , (48) 

the ndim is the number of the dimensions, i.e. ndim = 1 for ID, ndim = 2 
for 2D and Udim = 3 for 3D and ag is the global maximal wave speed. In 
Section [5l we show several 2D and 3D tests in some special test problems. 
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4. AMR parallelization strategy 

The schematic chart (Fig. [1]) shows the main process in the MAP code. 
The AMR parallehzation is more difficuh than the usual Eulerian meshes. 
The main difficuhs are: (1) how to arrange the structure for AMR hierar- 
chical blocks? (2) how to guarantee the load balance when new blocks are 
generated? (3) how to apply the boundary conditions between different pro- 
cessors and between different refinement levels? We use the two dimensional 
MAP code to explain what we did in the AMR algorithm for simplicity and 
intuition. The aim of this section is to give a detailed explanation of the MAP 
code. Another purpose is to let the readers who have no idea about the AMR 
to know what is AMR and how to write an AMR code by themselves. MAP 
code implements the MPI parallelization which supports the MPICH2 and 
OpenMPI softwave, which are full MPI-2 standards. The other paralleliza- 
tion softwares like OpenMP which requires a shared-memory computer are 
not supported in our code. 

4.1. Hierarchical structure 

Supposing that we have a 2D computational domain with the total mesh 
cells of 8 X 8, the domain initially is divided into 4 subdomains, labeled by 1, 
2, 3, 4. Every block has the same cells 4x4 and are surrounded by physical 
boundaries in gray and inner boundaries in yellow as shown in the upper left 
panel of Fig. [3] (note that the inner boundaries exist only between blocks). 
The numerical scheme discussed in Section [3] will solve the four blocks in 
turn. Then we can get the solution at the new time by gathering the data 
from all blocks. 

Assuming that blocks 1 and 3 are in processor 0, blocks 2 and 4 in proces- 
sor 1 and block 1 has already been refined as shown by the upper middle and 
upper right panels of Fig. |3l The number of cells in blocks 5, 6, 7, 8 is the 
same as the block 1, i.e. 4x4, which means that the amount of calculation 
of the blocks 5, 6, 7, 8 is the same as the blocks 1, 2, 3, 4. All blocks are 
connected by the so-called link list. Every process includes two kind of link 
lists, one for collecting information in the local processor and the other one 
for global information, i.e. (1) global link list and (2) local link list. The 
local one refers to the links in the local processor while the global one links 
all blocks which exist in all processors as shown in Fig. |H That is, the lo- 
cal ones are totally different in individual processors but the global ones are 
exactly the same. Actually, the global link is enough to complete the AMR 
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Figure 3: The AMR hierarchical structure. Upper left panel: The computational domain 
has four blocks with cells 4x4 and the regions in gray and yellow are physical and inner 
boundaries. Only the inner boundaries of block 1 have been plotted in this panel. Note 
that the inner boundaries only exist between blocks. Upper middle panel: Blocks treated 
in processor are in blue and those in processor 1 are in green. Block 1 has been refined 
to blocks 5, 6, 7 and 8. The inner boundaries have been omitted in this panel. Upper 
right panel: Another viewing angle for the middle panel without the physical and inner 
boundaries. The dashed lines in three panels are the H-curves. Lower left panel: the child 
points of block 1 point to blocks 5,6,7 and 8. Lower middle panel: the parent pointers 
of blocks 5, 6, 7 and 8 point to block 1. Lower right panel: The eight neigh pointers of 
block 8 point to blocks 5, 6, 2, 2, 4, 3, 3 and 7, respectively. 
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algorithm, because one can judge which block in the link is located in the 
local processor. However, as we know, the global link sometimes is very long 
and may include several hundred thousand blocks, so the frequently used 
searching operations may take a long time. The order of these two links are 
arranged by the Hilbert space filling curve (H-curve) which is shown by the 
dashed lines in the upper panels of Fig. [31 The H-curve maps multidimen- 
sional data to one dimension while preserving locality of the data points. The 
order of the H-curve in the finer level blocks is determined by the parameters 
{s-pos, ejpos, sjpoint in Tabled]) of the H-curve in their parent blocks. 
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Figure 4: The global links (dashed arrows) and local links (solid arrows) between different 
blocks and different processors. 

Every block, which is a STRUCTURE type variable in FORTRAN lan- 
guage, includes some information and data. The contents of information are 
listed in Table [U Some examples: the Iv of block 8 is 2 and the processor _id 
of it is 0. The id of block 8 is {{1, 1}, {2, 2}} and block 5 is {{1, 1}, {1, 1}}. 
The child pointers of block 1 point to blocks 5, 6, 7 and 8 (lower left panel of 
Fig. E]), respectively. The parent pointers of blocks 5, 6, 7, 8 point to block 
1 (lower middle panel of Fig. [3]). The eight neigh pointers of block 8 point 
to blocks 5, 6, 2, 2, 4, 3, 3, 7 (lower right panel of Fig. |3]), respectively. The 
neigh pointers have some identical values because, for instance, the right and 
lower right neighbors are the same. 

4-2. Load balance 

Before introducing what is the load balance, we first introduce the def- 
inition of framework^ which is very important to find a simple way to do 
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Table 1: The block information. 



Variable 



Interpretation 



Iv Level of current block. 

processor _id Processor rank the current block belong to. 

nx, ny, nz Grid points or cells in x, y, z-direction of the current block. 

x,y,z X, y, z-coordinates of the current block. 

hjnumher Position of the current block in the Hilbert curve. 



sjpos, ejpos, sjpoint 
id 



Used to generate the Hilbert curve in the finer level 

(array). 

Position of the current block in the total 
AMR hierarchical levels (array). 



parent 
child 

neigh 

next 
frameworkjnext 



Link to the parent block of the current block (pointer). 
Link to the children blocks of the current block, 

if no point to NULL (pointer array). 
Link to the neighbor blocks of the current block, 
if no point to NULL (pointer array). 
Link to the next block in the local link list (pointer). 
Link to the next block in the global link list (pointer). 



the load balance. Just as its name implies, framework is the framework of 
the AMR hierarchical structure. It is the global link list plus the all blocks' 
information listed by Table [1] Like the global link list, the framework in all 
processors should be exactly the same. In this case, block 8 in the processor 
knows which is its right neighbor block located in the processor 1 as shown 
in Fig. [3l The processors have the same frameworks, but they need not 
store all the variables of the blocks which belong to other processors. That is 
to say the blocks which belong to other processor have only the information 
listed by Table 1. In Fig. HI the blocks in blue and green colors denote that 
only these blocks have allocated the memory for variables. It is noted that 
the blocks without the variable data can still be refined or destroyed. The 
child blocks also have no variable data. And the memory required to store 
the entire global link list is very small. 

The framework structure has some advantages: (1) load balance can 
be done all at once rather than iteratively. That is to say, all blocks can 
be refined or destroyed to a suitable level according to the regrid algorithm. 
However, there is a special situation we have to carry on the regriding and 
load balance level by level (iteratively). When only a very small region has 
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to refine to a very high level, while this region locates in only one processor, 
then the code has to allocate a huge number of memory in one processor. If 
the memory in one node is not enough, then the code crashes. Therefore, 
in our MAP code we can choose whether carrying on the regrid and load 
balance processes iteratively or not. (2) The data exchanging is simple. 
When carrying on the load balance, for instance. Processor 1 has to send the 
data of block A to processor 2, the processor 1 only needs to send the data of 
variables to the processor 2. And the block A (it already exists in processor 
2 but without variable data) in processor 2 will be filled by this data. It 
needs not to send the neighbors information to processor 2 simultaneously. 
Because of the so-called framework, processors know the number of blocks 
in different processors and the total number of blocks in all processors. Thus, 
the code can calculate the load in different processors and balance the load by 
sending the block data from one processor to other one. The load is measured 
by the number of blocks which will be solved using the schemes described in 
Section [31 For instance, the block 3 in the upper middle panel of Fig. |3]will 
to sent to processor 1 by MPI calls because there are five blocks needed to 
be calculated in the processor (the block 1 can be updated by the average 
values of its child blocks) and two blocks in the processor 1. This is not a 
rigorous balance because of the number of blocks which will be solved, the 
parent blocks (e.g. the block 1) which will be updated, and the boundary 
data which will be exchanged are different. However, this difference is very 
small when the total number of blocks is much greater than the number of 
processors. 

After the load balance is done, the code has to do another important 
job: neighbor statistics. This is a preparation for the boundary conditions 
between different processors in the next several or dozens of time steps as 
shown in the schematic chart (Fig. [T]). In this procedure, the code will record 
the number and the id of blocks which need to reflux, the blocks which need 
to send data from fine blocks to coarse blocks, and the blocks which need to 
send data to the blocks at the same level. Once the number of the blocks is 
known, the code can allocate enough memory to store the data which will 
be sent or received in advance. When the block id is known, the block will 
be found immediately and store its data to the allocated array or update its 
data from the allocated array. This can reduce the operations of allocating 
and freeing memory, so that the performance of the code can be greatly 
improved. 
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4-3. Boundary condition 

The neighbor statistics can provide enough information for processing the 
boundary conditions quickly. As we described above, for boundary conditions 
we need to do: (1) data updating from the child blocks to the parent blocks; 
(2) reflux; (3) data exchange between blocks in the same level; (4) data ex- 
change between different levels. We have several steps to do these. Every 
step has two small sub-steps: inter-processor first and then intra-processor. 
The inter-processor means the boundary conditions between different pro- 
cessors and the intra-processor stands for the boundary conditions inside one 
processor. The procedure is: (1) to update the parent data. In this step, we 
should care about shareblocks in the inter-processor case. The shareblock 
was firstly introduced by lH, which refers to the child blocks of one block 



located in different processors. Suppose that block 8 in Fig. |3] belongs to the 
processor 1 while its parent block belongs to the processor 0, so the block 8 is 
called the shareblock. The shareblock has to send all of its data (without the 
ghost region) to the corresponding position in the global link list of processor 
in order to update the data of block 1; (2) to reflux the data between the 
fine-coarse boundaries as shown by the upper right panel of Fig. |3l Because 
the data of the coarse block 1 is updated by the fine blocks 5, 6, 7, 8 and 
the flux between the blocks 1 and 2 is changed, the block 2 has to use the 
new numerical flux to update its boundary data as suggested by [l| and jij; 
(3) to exchange the x-direction data in the same level; (4) to exchange the 
l/-direction data in the same level; (5) to exchange the 2;-direction data in 
the same level. The MAP code does not need extra requirement of MPI 
calls for the corner data (as shown by Fig. E]). The steps (3) - (5) can be 
explained by the left panel of Fig. O In this figure, the blue dashed ghost 
region will be updated by the data of the blue dashed box in the block 2; 
Then, the ghost region of the block 3 will be updated by the data of the red 
dashed box (already included the corner data) in the block 1. Of course, the 
steps (3) - (5) also treat the physical boundary conditions at the same time; 
(6) to update the ghost region of the fine block by interpolating the data 
of the coarse block at the fine-coarse boundaries. The only important thing 
here is that we should use a conservative interpolation. The formula is taken 
from [oj: 
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Figure 5: Left panel shows the data exchange at the same level. In the panel, we exchange 
the data in the x-direction first and then the y-direction. The corner data are also updated. 
The right panel shows a special situation in which the corner data can not be updated 
correctly. 
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where the fine block (/) is a child of the coarse block (c), the grids index 
{i"^, j'^, k'^) is for the coarse block and the coordinate {x,y,z) is defined at 
the center of the cell. The function of minmod is given by Eq. (132|) : (7) to 
fix the corner data. The steps (1) - (6) are successful to exchange almost 
all boundary data except one special situation shown by the right panel of 
Fig. El Only the lower left neighbor of the special block (S) is a coarse block. 
Before the fine-coarse step (6), the non- updated corner data of the normal 
block (A^) are sent to the block (S) since the steps (3) - (5) is the advance 
step (6). Thus the corner data of the special block (S) need a fix step. In 
the MAP code, we just look for these so-called S blocks and do the steps (4) 
- (5) again. 
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4.4- Framework synchronization 

As we mentioned above, every processor knows the total information of 
the AMR hierarchical structure, it is possible to accomplish the load balance 
and the boundary conditions quickly and effectively. However, the next ques- 
tion is how to keep the framework systems synchronous for every processor. 

The framework synchronization is something like the cloning technique. 
The processors can generate the same framework by the same gene. The 
local sequence of gene is built by recording the id of every block which will 
be regrided (including refining and destroying procedures) in a regridding 
operation in the current processor. The global sequence of gene is obtained by 
merging all the local sequences. The MPI communication of mpijallgatherv 
is necessary for such a merging. The criterion for refining or destroying will 
be discussed in Section 14.51 With the assumption that we have the same 
framework in the current state, we can get the same gene after regriding 
and then the code should generate the exactly same framework at the next 
time step. 

4-5. Regrid 



The error estimation formula is taken from Matsumoto [9|, Ziegler [11 



which is determined by the first and second derivatives in the current level: 
V \u\+e \\Au\\2 + F ■ {\u\ + e) J ' ^ ^ 



lAiil 



V3 



-l,j,k) + {Ui,j+l,k — Ui,j-l,k) + 



— Ui,j,k-l) ) 

(51) 



2n1/2 



||A U\\2 = {{Ui+ij^k - '^Ui,j,k + Ui-l,j,k) + 

{ui,j+i,k - '^Uij^k + Ui,j~i,kf + (52) 

where the value of the filter F is 0.05, R is the refinement ratio and L the 
level of the current block, and e is taken as 10~^^ for p,p, T and 0.1 for v, B. 
The parameter B = 0.6 specifies a bias toward the first derivatives for the 
criterion. The value ^ makes the threshold for higher levels higher < 0) or 
lower (,^ > 0). When the error exceeds the given threshold values, the current 
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block will be refined. If the errors estimated in every cell in the current block 
are larger than the threshold values, the child block of the current block can 
be destroyed. We also try to use the Richardson error estimation, however, 
this method usually takes a long time and is not so effective. For this reason, 
the Richardson error estimation is not included in the MAP code. 

After several time steps the regridding operation will be carried on. The 
estimation interval for the next regridding is 

_ ^ min(Ax, Ay, Az) _ ^ 2ngndimAt ^^^^ 

ag ^ Ccfl 

where Atr is the time interval for the next regridding and At is the time 
step of the simulation, which is taken from the finest level, and ag is the 
global maximum wave speed. The different time steps for different levels will 
be achieved in the future work. Since the code checks the errors for every 
cell covering the ghost region so the length of buffer region in which the 
waves or discontinuities do not propagate to the coarse block before the next 
regridding time is 2ng (Fig. E]), where is the number of ghost cells. 



Figure 6: The blocks with blue borders show the coarse blocks and the red ones show the 
fine blocks. Assuming that the ghost cell is 1, the buffer zone for the next regridding is 
the shadowy cells in the fine blocks since the MAP code checks every cell in the current 
block. 



5. Numerical tests 

Our MAP code is examined with a variety of test and application prob- 
lems in this section. The ID problems test the difference between different 
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schemes and the accuracy of them. The 2D and 3D problems check the AMR 
parallel performance of our code. In all our test cases, we use the primitive 
variables to describe the initial conditions. 



5.1. ID accuracy test 

We test the accuracy of the three schemes, namely, MMC, LF and WENO 
by simplest advection problem: {p,Vx,Vy,Vx, B^, By, Bz,p) = ((sin(27ra;) + 
2)/3, 1, 0, 0, 0, 0, 0, 1) in the domain [—0.5,0.5] with the periodic boundary. 
Lax-Friedrichs splitting for MMC and Lax-Friedrichs flux for LF and WENO 
are used in this test. The adiabatic index is 7 = 1.4. The L^, LF' and L°° 
errors are listed in Table HI which are defined as: 



L'i^) = ^^l\<-U^\ , (54) 

i=l 




where U is the exact solution and N the grid number. From Table [21 it can 
be seen that these schemes are second order accuracy and WENO scheme is 
better than the other two. This is why we mainly use WENO for most of 
our 2D and 3D tests. 



5.2. ID Brio-Wu shocktube test 



This test is taken from [30|, which is a standard 1.5D (one dimensional 
multi components) MHD shocktube problem. The initial conditions of the 
left (x < 0) and right (x > 0) states are {p,Vx,Vy,Vx, B^, By, Bz,p)l = 
(1, 0, 0, 0, 0.75, 1, 0, 1) and (p, v,, Vy, v,, B^, By, B„ p)r = (0.125, 0, 0, 0, 0.75, -1, 0, 0.1) 
with the adiabatic index 7 = 2. The length of computational domain is 1 
and X e [—0.5, 0.5]. The upper panels of Fig. [7] show the comparison between 
the three kinds of schemes without the approximate Riemann solvers (Lax- 
Friedrichs splitting for MMC and Lax-Friedrichs flux for LF and WENO) 
while the lower panels show the same scheme (WENO) but with different 
Riemann solvers. From this figure, we found that the WENO and LF schemes 
are much better than the MMC scheme (upper panels). Although the MMC 
scheme has the same accuracy as the LF scheme, the former is not suitable 
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Table 2: L-^, and L°° error and order in ID accuracy test. 



Scheme 


N 






order 


error 




order 


L°° error 




order 




128 


5.47 X 10- 


-3 


— 


7.27 X 10- 


-3 


— 


1.74 X 10" 


-2 


— 




256 


1.55 X 10- 


-3 


1.82 


2.34 X 10- 


-3 


1.64 


7.18 X 10- 


-3 


1.27 


MMC 


384 


7.37 X 10- 


-4 


1.84 


1.20 X 10- 


-3 


1.65 


4.25 X 10- 


-3 


1.30 




512 


4.32 X 10" 


-4 


1.85 


7.42 X 10- 


-4 


1.66 


2.91 X 10- 


-3 


1.31 




640 


2.84 X 10" 


-4 


1.88 


5.12 X 10- 


-4 


1.66 


2.17 X 10- 


-3 


1.31 




128 


4.64 X 10- 


-3 




6.32 X 10- 


-3 




1.72 X 10- 


^2 






256 


1.31 X 10- 


-3 


1.82 


2.03 X 10- 


-3 


1.64 


7.12 X 10- 


-3 


1.27 


LF 


384 


6.20 X 10- 


-4 


1.85 


1.04 X 10- 


-3 


1.65 


4.22 X 10- 


3 


1.29 




512 


3.62 X 10" 


-4 


1.87 


6.44 X 10- 


-4 


1.66 


2.90 X 10" 


-3 


1.30 




640 


2.38 X 10- 


-4 


1.89 


4.45 X 10- 


-4 


1.66 


2.17 X 10- 


-3 


1.31 




128 


2.99 X 10- 


-3 




4.62 X 10- 


-3 




1.24 X 10- 


-2 






256 


7.04 X 10- 


-4 


2.09 


1.35 X 10- 


-3 


1.77 


4.68 X 10- 


-3 


1.41 


WENO 


384 


2.96 X 10- 


-4 


2.14 


6.54 X 10- 


'4 


1.79 


2.61 X 10- 


-3 


1.44 




512 


1.59 X 10- 


-4 


2.16 


3.89 X 10- 


-4 


1.81 


1.73 X 10" 


-3 


1.44 




640 


9.79 X 10- 


-5 


2.17 


2.60 X 10- 


-4 


1.81 


1.25 X 10- 


-3 


1.46 



to treat the MHD problem. HLLD and Roe Riemann solvers are more ac- 
curate than HLLC (lower panels). However, in our multi-dimensional tests, 
the HLLD and Roe solvers fail sometimes, thus we mainly use HLLC for our 
2D and 3D test problems. 

5.3. 2D accuracy test 

As the ID advection test, we also test the convergence of the three schemes 
in 2D advection problem: {p,Vx,Vy,Vx, B^, By, Bz,p) = ((sin(27r(a; + y/-\/3) + 
pz/2)+2)/3, 1, 0, 0, 0, 0, 1) in the domain [-0.5, 0.5] x [- V^/2, ^3/2] with 
the periodic boundary. The velocity has angle of 7r/6 with the y-axis. Lax- 
Friedrichs splitting for MMC and Lax-Friedrichs flux for LF and WENO are 
used in this test. The adiabatic index 7 = 1.4. The L^, and L°° errors 
are listed in Table |3l As shown by this table, we got almost the same results 
with the ID test. Moreover, we found that the extended 3D advection test 
also given the similar result, which is no longer necessary to discuss in this 
paper. 
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Figure 7: Coniparsion between the modified Mac Cormack Scheme (MMC), Lax-Fridrichs 
scheme (LF), and weighted essentially non-oscillatory (WENO) scheme in the 1.5D MHD 
shocktube problem. The figure shows the range from —0.4 to 0.4 at time 0.10. The 
solid line is a reference solution with 4000 cells, while the MMC, LF, WENO schemes are 
computed with 800 cells. The lower panels are calculated based on the WENO scheme 
with different approximate Riemann solvers, i.e. HLLC, HLLD and Roe solver. The right 
panel is a zoom-in view of the dashed box in the left panel. 
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Table 3: L-^, and L°° error and order in 2D accuracy test. 



Scheme 


N 






order 


error 




order 


L°° error 


order 




64 


1.89 X 10- 


-2 


— 


2.30 X 10- 


-2 


— 


4.82 X 10-2 


— 




128 


6.35 X 10- 


-3 


1.59 


8.34 X 10- 


-3 


1.48 


2.09 X 10^2 


1.22 


MMC 


192 


2.95 X 10- 


-3 


1.90 


4.32 X 10- 


-3 


1.63 


1.26 X 10-2 


1.25 




256 


1.76 X 10- 


-3 


1.80 


2.70 X 10- 


-3 


1.65 


8.75 X 10-3 


1.27 




320 


1.16 X 10- 


-3 


1.90 


1.86 X 10- 


-3 


1.67 


6.58 X 10-3 


1.29 




64 


2.07 X 10- 


-2 




2.49 X 10- 


-2 




5.16 X 10-2 






128 


6.99 X 10- 


-3 


1.58 


9.09 X 10- 


-3 


1.47 


2.25 X 10-2 


1.21 


LF 


192 


3.35 X 10- 


-3 


1.83 


4.67 X 10- 


-3 


1.65 


1.35 X 10-2 


1.25 




256 


1.96 X 10- 


-3 


1.87 


2.91 X 10- 


-3 


1.64 


9.38 X 10-3 


1.28 




320 


1.29 X 10- 


-3 


1.87 


2.01 X 10- 


-3 


1.66 


7.04 X 10-3 


1.29 




64 


1.78 X 10- 


-2 




2.09 X 10- 


-2 




4.14 X 10-2 






128 


4.39 X 10- 


-3 


2.05 


6.40 X 10- 


-3 


1.72 


1.63 X 10-2 


1.36 


WENO 


192 


1.90 X 10- 


-3 


2.07 


3.15 X 10- 


-3 


1.76 


9.26 X 10-3 


1.40 




256 


1.05 X 10- 


-3 


2.08 


1.89 X 10- 


-3 


1.78 


6.17 X 10-3 


1.42 




320 


6.64 X 10- 


-4 


2.06 


1.27 X 10- 


-3 


1.79 


4.49 X 10-3 


1.43 



5.4- 2D parallelization efficiency test 

A simple MHD 2D test is a blast wave. The problem can generate several 
shocks from the central high gas pressure zone, it is very suitable to test the 
AMR algorithm and the parallel efficiency. The density (p) and magnetic 
field {B^, By, Bz) are uniform in the domain [—0.5, 0.5] x [—0.5, 0.5] with the 
value 1 and (l/-\/2, l/v^, 0), respectively. A small hot area is located at the 
center of the domain, i.e. p = 10 within the circle x"^ + y'^ < 0.l2, whereas 
p = 1 out of this circle. Adiabatic index is 7 = 5/3. The left panel of 
Fig. [8] shows the pressure distribution at time 0.16 and the right one shows 
the AMR mesh with the refinement level L = 5. Since the base resolution 
is 256 X 256, the effective resolution is 4096 x 4096. Fig. |H] corresponds to a 
8-processor run, the parallelization efficiency for other number of processors 
are listed in Table HJ where Np is the number of processors. Blocks indicates 
the total number of blocks in all processors at time 0.16. The efficiencies 
higher than 1 may due to the speed fluctuation of the supercomputer. 

5. 5. 2D divergence free test 



The test was introduced by [33| to check the conservation of the divergence- 



free condition. If we had not taken this condition into account, the code 
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Table 4: Parallelization efficiency for Blast wave test (AMR level L = 5). 





Blocks 


Time 


Efficiency 


1 


15512 


88504 s 


1 


2 


15512 


43595 s 


1.02 


4 


15512 


21941 s 


1.01 


8 


15512 


11327 s 


0.98 


16 


15512 


5731 s 


0.97 


32 


15512 


2908 s 


0.95 


64 


15504 


1521 s 


0.90 


128 


15504 


831 s 


0.83 



Time = 0.16 Time=0.16 




-0.4 -0.2 0.0 0.2 0.4 -0.4 -0.2 0.0 0.2 0.4 

X X 



Figure 8: Left panel is the gas pressure distribution with the velocity arrows and magnetic 
field lines at time 0.16. The right panel shows the AMR mesh with the refinement level 5. 
The eight colors in the right panel stand for the blocks occupied by the eight processors. 
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may give us non-physical results in some MHD applications. The problem 
domain is [—0.5,0.5] x [—0.5,0.5], with the uniform density (p = 1) and ve- 
locity {vx = 1, Vy = 1, Vz = 0). The magnetic configuration is given by 
Bx = —Boy/r, By = Bqx/t and B^ = in the region r = -^/x^ + < 0.2. 
In this region, we set p = 1 — -Bg/2 for magnetostatic equilibrium and p = 1 
for outside of this region. Adiabatic index 7 = 5/3 and Bq = 1 x 10~^. The 
results are displayed in Fig. [HI The magnetic loops advect across the bound- 
ary twice when the dimensionless time is 2. As shown in Fig. |9l the magnetic 
loops without divergence cleanance are distorted {middle panel). However, 
when the divergence cleanance is conducted, the loops keep their original 
shapes [right panel). The V ■ B errors are plotted with the function of time 
in the Fig. [TUJ In this figure, we show the performances of the three schemes. 
As expected, the methods using EGLM-MHD equations can maintain the er- 
ror one order of magnitude smaller than the methods using pure MHD. The 
method with higher resolution also given less V ■ B error. Although it is not 
so perfect as the CT schemes, EGLM-MHD has its advantages: (1) it is easy 
to be equipped in the existed MHD codes; (2) it can damp the non- divergence 
error produced by the rapidly- changing boundaries. 



Time=0.00 Time=2.00 Time=2.00 




Figure 9: Total magnetic pressure distributions with the magnetic field lines of diver- 
gence free test. Left panel: The initial condition; Middle panel: advection result at time 
2 without divergence cleanance method; Right panel: advection result at time 2 using 
EGLM-MHD with a coefficient Cd = 0.18 as described in Section O This simulation is 
carried on by using the WENO scheme with the Lax-Friedrichs flux. The resolution in 
this test is 256 x 256, no AMR involved. 
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Figure 10: The V • B average errors with a function of time for different methods and 
resolutions (the numbers 256 and 512 mean the resolutions of 256 x 256 and 512 x 512). 
Left panel: the MMC scheme; Middle panel: LF scheme; Right panel: WENO scheme. 
Lax-Friedrichs splitting for MMC and Lax-Friedrichs flux for LF and WENO are used in 
this test. 



5.6. 2D magnetic rotor 

This test suite is introduced by Balsara and Daniel 42| , which is used to 
test the propagation of strong torsional Alfven waves. The rotating disk in 
the computational center generates shocks and waves from the strong shear- 
ing surface between the rotating disk and the static ambient fluid. We use 
the initial conditions as described by Toth [sS*] which involves a higher initial 
velocities. It is better to test the robustness of our code. The computational 
domain is [—0.5,0.5] x [—0.5,0.5], the initial distributions are: 



p = < 1 + 9/ for ro < r < ri , (57) 




for r < ro 

for tq < r < Ti , (58) 
for r > ri 

for r < Tq 

Vy = ■^^ 2fx/r for tq < r < ri , (59) 
for r > ri 




where, r = ^J~x^^^r]p' and / = (^i — r)/(ri — rg) with Tq = 0.1 and = 0.115. 
/ function helps to reduce initial transients. The gas pressuse p = 1 and 
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magnetic field B^. = 5/ By = are uniform. The third components 
of velocity and magnetic field are set to zero. Adiabatic index 7 = 1.4. 
The density, gas pressure, Mach number and magnetic pressure distributions 
at Time = 0.15 are shown in Fig. [TTl The test is based on the WENO 
scheme with the resolution of 400 x 400 and no approximate Riemann solvers 
included. From this figure we can observe that the torsional Alfven waves are 
generated. The EGLM-MHD equations prevent the magnetic monopoles to 
form and the contours of Mach number keep the shape of concentric circles. 
Other methods (LF and MMC) show almost the same results. 

5.7. 2D magnetic reconnection application 

In this subsection we mainly test the effect of resistivity in the magnetic 
reconnection process. This kind of resistivity may due to some microscopic 
instabilities, although how these instabilities drive the macroscopic reconnec- 
tion is still not clear. The velocities {vx,Vy,Vz) are zero in the initial state. We 
set a uniform density (p = 1) and pressure {p = 0.1) distributions with an 
anti-parallel magnetic configuration (see below) in the computational domain 
([-0.5,0.5] X [-2,2]) as follows: 

5. = , (60) 

for X < —Lr 

By = ■I sm{nx/2Lr) for |a;| < , (61) 



-1 




for 


X ^ — Lj- 


sin 


nx/2Lr) 


for 




1 




for 


X ^ Lrj- 







for 


X ^ 


cos 


[ttx/2Lj.) 


for 









for 


X ^ Ly 



Bz = \ cos(7rx/2Lr) for |x| < , (62) 



where Lr = 0.05 is the half width of the resistivity region in the x-direction. 
The resistivity has the form 77 = ?7o (cos(7rx/0.1) -|- 1) (cos(7r?//0.4) + 1) /4 in 
the small region [—0.05,0.05] x [—0.2,0.2]. The reconnection becomes fast 
when localized resistivity is included. The simulation results are shown in 
Fig. [121 We get a fast magnetic reconnection results by a localized resistivity 



region [43|, |4J] at the center of the domain. The magnetic reconnection 
releases the magnetic energy to heat the matter located at the center of 
the computation box. The upward and downward outflows are accelerated 
to the Alfven speed by the magnetic tension force. This test is based on 
the WENO scheme with the Lax-Friedrichs flux. The effective resolution 
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Mach Number Magnteic Pressure 




-0.4 -0.2 0.0 0.2 0.4 -0.4 -0.2 0.0 0.2 0.4 

X X 



Figure 11: The density, gas pressure, Mach number and magnetic pressure distributions at 
Time = 0.15. There are 30 equally spaced contours between the maximum and minimum 
for each plot. This test is based on the WENO scheme with the resolution of 400 x 400 
and the Lax-Priedrichs flux. 
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is 2048 X 4096 and we can observe some plasmoids formed at the positions 
X = and y = ±0.2. 

In Fig. [131 we can see five variable distributions along the white horizontal 
line in the right panel of Fig. [12] and the reconnection rate as a function 
of time. The variable distributions clearly show the slow mode shock in 
the ranges (—0.03 < x < —0.01 and 0.01 < x < 0.03). The variables 
sharply changed between the upstream and the downstream of this shock. 
The reconnection rate is calculated by Vin/VA, where Vin is the inflow speed 
and Va the Alfven speed. As shown by the lower right of Fig. [131 the 
reconnection rate reaches to the maximum around the time 1.0 and the value 
of this rate is about 0.76 which is in the range of 0.01 — 0.1 as expected by 
Petschek 45 . 




ime=2. 

I 

J 

m 




Figure 12: Gas pressure distributions with velocity arrows and magnetic field lines at time 
1.0, 1.5, 2.0 and 2.5. This test is based on the WENO scheme with the Lax-Friedrichs 
flux. Base resolution is 128 x 256 and the maximum refinement level is 5. 



5.8. 2D thermal conduction test 

In this test, we test the thermal condction effect with the method of subcy- 
cle and without the subcycle. Considering fully ionization plasma (7 = 5/3), 
the thermal conduction can only transfer the energy along the magnetic field 
lines. The conduction term V- (kVT) in the MHD energy equation (Eq. ([8])) 
is changed to the form V- (koT^''^(B ■ VT)B/i?^), where kq is a coefficient for 
thermal conduction (we take Kq = 0.001 in this test). The initial condition is 
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Figure 13: The distributions of gas pressure, magnetic pressure, x-component velocity, 
y-component velocity, density along the whijg line in Fig. [12] at time 2.5. The lower right 
panel is the magnetic reconnection rate as a function of the time. The dashed lines show 
the velocity value of 0. 



very simple. In the computational domain ([—0.5, 0.5] x [—0.5, 0.5]), we have 
uniform distributions: {p,Vx,Vy,Vx, B^, By, Bz,p) = (0.1,0,0,0,1,1,0,0.1). 
Besides, we set a small hot range {p{x,0) = 0.01 in the range |x| < 0.1) 
at the bottom of the boundary and the heat will be transferred along the 
oblique magnetic field lines as shown by Fig. [TH The bottom boundary is 
maintained as its initial value in the total evolution, while the left and right 
boundaries are periodic. Finally, the top boundary is free. 

Fig. [TH shows that the results with and without subcycle almost have 
the same results. Additional comparison is given in Fig. [15] which taken the 
temperature distributions along a vertical line {x = 0.4) for both the upper 
right and the lower right panels. We can see that the shapes of two results 
are identical. We checked the values for both cases and found the difference 
between two case is less than 0.5%. However, the time spent in both cases 
are not in the same order of magnitude. For instance, in this test, the total 
elapsed time is 2225.27 seconds for the subcycle case while 28770.04 seconds 
for the case without subcycle. That is to say, we can get a almost identical 
result in a relatively short time by using the subcycle method. 

5. 9. 3D Rayleigh Taylor instability application 

In the last application we check the performance of our MAP code in the 
3D domain [—0.5, 0.5] x [—0.5, 0.5] x [—1, 1] with a gravity field {g^ = 0, Qy = 0, 

= —1). The heavy gas (p = 3) fills the upper half of the computational 
box {z > 0) while the light gas (p = 1) fills the lower half (^ < 0). A 
velocity perturbation is introduced in the z-direction to trigger instability. 
The mathematical form of the perturbation is Vx = 0, Vy = 0, Vz = 0.01(1 + 
cos(7ra;))(H-cos(7ry))(H-cos(7rz)) in the cubic region [—0.5, 0.5] x [—0.5, 0.5] x 
[—0.5,0.5]. The pressure (p) is changed from 1 at the bottom boundary to 
5 at the top boundary to get the hydrostatic balance. There is no magnetic 
field and the adiabatic index 7 = 7/5. The resuls are shown in Fig. [161 The 
statistics of the time consumption is given in Table [5] for a 128-processor run 
and AMR level L = 4. 

This simulation can test the performance of the source term and the 
symmetry of our code. As we can observe from Fig. [THl the Rayleigh Taylor 
instability is successfully formed and the Kelvin-Helmholtz instability is also 
generated at the interface. The WENO scheme maintains a good symmetrical 
property in this problem. However, as we know, the final structure of this 
Rayleigh Taylor instability test is sensitive to the numerical diffusion of the 
scheme we used. If we use the Lax-Fridrichs flux instead of HLLC soler. 
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Time=0.01 Time=0.12 Time=0.42 




-0.4 -0.2 0.0 02 0.4 -0.4 -0.2 0.0 0.2 0.4 -0.4 -0.2 0.0 0.2 0.4 



Figure 14: Temperature distributions with the magnetic field hues (white lines) at time 
0.01, 0.12 and 0.42. The upper and lower panels show the results with subcycle and without 
subcycle, respectively. This test is based on the WENO scheme with the Lax-Friedrichs 
flux. 
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Figure 15: Comparison of temperature distributions for the test with and without subcycle. 
The distributions are along the hue x = 0.4 at the Time = 0.42 for both cases shown in 
Fig. M 
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the final shape of the interface between the light and heavy fluids will differ 
wildly from the result shown in Fig. Uni Thus, we do not give the results got 
by other methods. 

Table 5: Statistics of the time consumption for the 3D rayleigh Taylor instability. 







128 


Levels 




4 


Effective resolution 


256 


X 256 X 512 


Total elapsed time 


5h 


55m 11.15s 



Regridding, Load Balance 
Integration 
Data output 
Boundary condition 
Other (initial condition, data check, 
calculate minimum At) 



6. Summary 

We have presented an MHD code with adaptive mesh refinement and 
parallelization for astrophysics, named MAP. We use three kind of schemes, 
namely modified Mac Cormack Scheme (MMC), Lax-Fridrichs scheme (LF) 
and weighted essentially non-oscillatory (WENO) scheme, combined with 
TVD limiters and approximate Riemann solvers (HLLC, HLLD, and Roe 
solvers). The divergence free condition is guaranteed by the EGLM-MHD 
equations. As for the AMR parallelization strategy, the conception of framework 
and gene may be discussed with a different names in other codes. Neverthe- 
less, there must be many differences from other existing codes, for instance, 
the treatments of the load balance, boundary condition, etc. It is also useful 
for the readers to understand what is AMR or how to develop an AMR code 
by themselves. 

Our MAP code can be easily applied to the actual problems by simply 
modifying the modules of initial and boundary conditions. For some special 
cases, the resistivity model and the damping region can also be changed. 
We are already using our MAP code for some solar MHD applications, for 
example, magnetic reconnection between emerging flux and the background 
canopy-type magnetic configuration which is aimed to explain the microfiares 



0.46% 
83.64% 
0.05% 
8.590% 

6.96% 
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Figure 16: Simulation result of the 3D rayleigh taylor instability at the time 3.0, 3.5 
and 4.0. Left panels: density distribution displayed by the volume rendering, which is 
visualized by the tool Visit developed by the Department of Energy (DOE) Advanced 
Simulation and Computing Initiative (ASCI). Middle panels: the density distribution on 
the x-z plane at the position y = 0. Right panels: the density distribution on the x-y 
plane at the position z — 0. The test is based on the WENO scheme with the HLLC 
approximate Riemann solver. 
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in the solar chromosphere and corona. These will be discussed in detail in 
future papers. 

This paper only describes the first version of our MAP code. The code 
needs to be further improved in many aspects. Up to now, an improvement 
should be done in the next job is the radiative transfer. As well known, in 
the solar photosphere and chromosphere, the optically-thick radiation is very 
important for the energy transfer. How to implement it into MHD equations 
should be a heavy work and needs a long time to code and test. Optically- 
thin case is relatively easy to include, for instance, one can write a subroutine 
for adding a source term to include the optically-thin transfer. The other 
implements like the mesh geometries in cylindrical and spherical grids will 
be added in the next version. It is emphasized here again that we did not 
take the schemes with the accuracy higher than two orders into account as 
we mentioned in Section [H since many operations in the AMR algorithm and 
the treatment in source terms, for instance, the radiation transfer mentioned 
above, can not reach such a higher accuracy. How to improve the accuracy 
globally is another topic which is out of the scope of this paper. 
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